1. INTRODUCTION

In recent years, control methodologies for robotic systems have been widely developed, not only in
practical applications [1, 2] but also in theoretical analysis [3-6]. The main challenges of control design
that have been considered include the robust adaptive control problem, motion/force control, input saturation
and full state constraints [7, 8], and the path planning problem [9]. Several control techniques have been
employed for manipulators to tackle input saturation by adding terms to a control input that was originally
designed without regard to the input constraint [4, 5, 10-13]. In [4], the authors proposed a new reference
for the control system to account for input saturation. The additional term is computed from the derivative
of the previous Lyapunov candidate function along the state trajectory under control input saturation [4].

Furthermore, the authors in [5] give a new approach that addresses the input constraints while also
handling disturbances. The proposed sliding surface employs the Sat function of the joint variables.
To overcome the difficulty of state constraints in manipulators, the authors in [7, 8] proposed a framework
combining a Barrier Lyapunov function, the Moore-Penrose inverse and a Fuzzy-Neural Network technique.
An equivalent sliding mode control algorithm was designed, and the bound of the control input was then
estimated. The advantage of this approach is that the input bound can be adjusted by selecting several
parameters.

The work in [10-13] presents a technique to handle the input constraint using a modified
Lyapunov candidate function. Because of actuator saturation, a quadratic term in the difference between
the control input computed by the controller and the actual signal applied to the plant is added to the
Lyapunov function. The control design is obtained after considering the derivative of the Lyapunov function
along the system trajectory. However, these traditional nonlinear techniques have several drawbacks, such as
the difficulty of finding an equivalent Lyapunov function and the dynamics of the additional terms
[7, 8, 10-13]. Optimization techniques using GA (genetic algorithm) and PSO (particle swarm optimization)
were used to solve the path planning problem [9]. The MPC (model predictive control) solution,
which is a special case of optimal control design, has been investigated for linear motors not only with an
online min-max technique in [14, 15] but also with an offline algorithm in [16]. For robot manipulators,
an optimal control algorithm yields a control design that can handle input and state constraints by
considering the optimization problem in the presence of constraints. An asymptotic optimal control design was
presented in [3] by directly solving the Riccati equation for linear systems. However, it is difficult to find
an explicit solution of the Riccati equation, or of the partial differential HJB (Hamilton-Jacobi-Bellman)
equation, in the general case. Approximate/adaptive dynamic programming (ADP) has received much attention for
the optimal control problem in recent years because it can approximately solve not only the Riccati equation
for linear systems but also the HJB equation for nonlinear systems. Using the Kronecker product technique,
the authors in [17] proposed an online solution for linear systems without knowledge of the system matrix,
based on a least-squares solution computed from a sufficient number of acquired data points. In [18],
Zhong-Ping Jiang et al. extended this online solution to completely unknown dynamics, so that the algorithm
depends on neither matrix A nor matrix B of the linear system. The Riccati equation was considered in more
detail with respect to computation and data acquisition. Moreover, exploration noise over the time interval
was included in the proposed algorithm [18]. Instead of employing the Kronecker product as in the linear
case, a neural network approximation of the cost function was used to implement an online adaptive
algorithm on the Actor/Critic structure for continuous-time nonlinear systems [19].

However, the proposed algorithm required knowledge of the input-to-state dynamics to update the
control policy, and the persistence of excitation condition was not considered [19]. The weights of the
neural network were tuned to minimize the objective in the least-squares sense [19]. The theoretical analysis
of the convergence of the cost function and control input in ADP was an extension of the work in [20].
Building on the theoretical analysis of neural network approximation, the authors in [21] presented a novel
online ADP algorithm that tunes both the actor and critic neural networks simultaneously. The critic neural
network (NN) weights were trained with a modified Levenberg-Marquardt algorithm to minimize the squared
residual error. Moreover, the tuning laws of the actor and critic NN weights depend on each other to achieve
weight convergence. It is worth noting that the persistence of excitation (PE) condition needs to be
satisfied, and Lyapunov stability theory was employed to analyze convergence [21]. Extending the work in
[21], and based on an analysis of the approximate Bellman error, the algorithm proposed in [22] can be
implemented online and simultaneously without knowledge of the drift term. In [23], an identifier with an
adaptation law is described using a neural network to approximate the dynamic uncertainties of the nonlinear
model. An extension using a special cost function was proposed in [24, 25] to handle the input constraint.
A framework combining the ADP technique and classical sliding mode control was presented to design the
optimal control for an inverted pendulum [26]. However, the effectiveness of ADP has not yet been considered
for a robot manipulator in the aforementioned research. This work proposes a control algorithm combining
exact linearization, the Robust Integral of the Sign of the Error (RISE) [3] and the ADP technique for
manipulators in the absence of holonomic constraints. The ADP technique is implemented using a simultaneous
tuning method to ensure weight convergence and stability.


2. DYNAMIC MODEL OF A ROBOT MANIPULATOR AND CONTROL OBJECTIVE 
Consider the following robot manipulator without constraint:

$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + \tau_d = \tau \qquad (1)$$

Several appropriate assumptions [3] are made to develop the control design in the following
sections.

Assumption 1. The inertia matrix $M(q)$ is symmetric, positive definite, and satisfies the following
inequality $\forall \xi \in \mathbb{R}^n$:

$$m_1 \|\xi\|^2 \le \xi^T M(q) \xi \le \bar{m}(q) \|\xi\|^2 \qquad (2)$$

where $m_1 \in \mathbb{R}$ is a known positive constant, $\bar{m}(q) \in \mathbb{R}$ is a known positive
function, and $\|\cdot\|$ denotes the standard Euclidean norm.


Assumption 2. The relationship between the inertia matrix $M(q)$ and the Coriolis matrix $C(q,\dot{q})$
can be represented as follows:

$$\xi^T \left( \dot{M}(q) - 2C(q,\dot{q}) \right) \xi = 0 \quad \forall \xi \in \mathbb{R}^n. \qquad (3)$$
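Assumptions 1 and 2 can be checked numerically for a concrete model. The sketch below is only an illustration: it assumes the two-link inertia and Coriolis matrices as they can be read from the simulation section of this paper, and the function names (`inertia`, `coriolis`, `inertia_rate`, `check_assumptions`) as well as the sampling ranges are hypothetical choices, not part of the original work.

```python
import numpy as np

def inertia(q):
    """Two-link inertia matrix M(q) (assumed from the simulation section)."""
    c2 = np.cos(q[1])
    return np.array([[5.0 + 2.0 * c2, 1.0 + c2],
                     [1.0 + c2, 1.0]])

def coriolis(q, dq):
    """Two-link Coriolis matrix C(q, dq) (assumed from the simulation section)."""
    s2 = np.sin(q[1])
    return np.array([[-dq[1] * s2, -(dq[0] + dq[1]) * s2],
                     [dq[0] * s2, 0.0]])

def inertia_rate(q, dq):
    """Time derivative of M(q) along a motion with joint velocity dq."""
    s2 = np.sin(q[1])
    return dq[1] * np.array([[-2.0 * s2, -s2],
                             [-s2, 0.0]])

def check_assumptions(n_samples=1000, seed=0):
    """Return the worst |xi^T (Mdot - 2C) xi| and the smallest eigenvalue of M."""
    rng = np.random.default_rng(seed)
    worst_skew, min_eig = 0.0, np.inf
    for _ in range(n_samples):
        q = rng.uniform(-np.pi, np.pi, 2)    # hypothetical joint range
        dq = rng.uniform(-5.0, 5.0, 2)       # hypothetical velocity range
        xi = rng.normal(size=2)
        S = inertia_rate(q, dq) - 2.0 * coriolis(q, dq)
        worst_skew = max(worst_skew, abs(xi @ S @ xi))
        min_eig = min(min_eig, np.linalg.eigvalsh(inertia(q)).min())
    return worst_skew, min_eig
```

For this model the quadratic form in (3) vanishes up to rounding error, and the smallest eigenvalue of $M(q)$ stays positive, which is what Assumption 1 requires of the lower bound.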


It should be noted that this manipulator is considered in the absence of holonomic constraint forces.
The control objective is to find a control algorithm that combines exact linearization, RISE and the
ADP technique to achieve position tracking control of the manipulator control system shown in Figure 1.
The ADP algorithm is employed to implement the optimal control design, as described in the next section.












[Block diagram with blocks: Controller; Manipulator, $\tau = M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + \tau_d$]

Figure 1. Control structure


3. ADAPTIVE DYNAMIC PROGRAMMING APPROACH FOR A ROBOT MANIPULATOR 
3.1. ADP algorithm 

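In [3], applying the control input (4), with the nonlinear function (5) and the error signals (6)-(8),
to the manipulator (1) leads to the nonlinear model (9):

$$u = -\tau + h + \tau_d \qquad (4)$$
$$h = M(q)(\ddot{q}_d + \alpha_1 \dot{e}_1) + C(q,\dot{q})(\dot{q}_d + \alpha_1 e_1) + G(q) + F(\dot{q}) \qquad (5)$$
$$e_1 = q_d - q \qquad (6)$$
$$e_2 = \dot{e}_1 + \alpha_1 e_1 \qquad (7)$$
$$r = \dot{e}_2 + \alpha_2 e_2 \qquad (8)$$
$$\dot{x} = f(x) + g(x)u \qquad (9)$$

where $x = [e_1^T, e_2^T]^T$, $f(x) = \begin{bmatrix} -\alpha_1 I & I \\ 0 & -M^{-1}C \end{bmatrix} x$ and $g(x) = \begin{bmatrix} 0 \\ M^{-1} \end{bmatrix}$.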
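The step from (1) and (4)-(7) to the state-space model can be verified numerically. The following sketch assumes the sign convention $u = -\tau + h + \tau_d$, for which the filtered-error dynamics read $M\dot{e}_2 = -Ce_2 + u$; it compares this expression with $\dot{e}_2$ computed directly from the manipulator dynamics. The two-link matrices are taken from the simulation section, while `gravity`, `friction` and the scalar gain `alpha1` are hypothetical placeholders used only to exercise the algebra.

```python
import numpy as np

def inertia(q):
    c2 = np.cos(q[1])
    return np.array([[5.0 + 2.0 * c2, 1.0 + c2], [1.0 + c2, 1.0]])

def coriolis(q, dq):
    s2 = np.sin(q[1])
    return np.array([[-dq[1] * s2, -(dq[0] + dq[1]) * s2], [dq[0] * s2, 0.0]])

def gravity(q):
    # Hypothetical gravity vector (not the one used in the paper).
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]), np.cos(q[0] + q[1])])

def friction(dq):
    # Hypothetical viscous friction (not the one used in the paper).
    return 0.1 * dq

def error_dynamics_gap(q, dq, qd, dqd, ddqd, tau, tau_d, alpha1=1.0):
    """Max difference between de2/dt from (1) and from M de2/dt = -C e2 + u."""
    M, C = inertia(q), coriolis(q, dq)
    G, F = gravity(q), friction(dq)
    e1 = qd - q                       # (6)
    de1 = dqd - dq
    e2 = de1 + alpha1 * e1            # (7)
    h = M @ (ddqd + alpha1 * de1) + C @ (dqd + alpha1 * e1) + G + F   # (5)
    u = -tau + h + tau_d              # (4)
    ddq = np.linalg.solve(M, tau - C @ dq - G - F - tau_d)            # (1)
    de2_direct = ddqd - ddq + alpha1 * de1
    de2_model = np.linalg.solve(M, -C @ e2 + u)
    return np.max(np.abs(de2_direct - de2_model))
```

A gap at rounding-error level for arbitrary inputs indicates that the lower block of $f(x)$ is $-M^{-1}Ce_2$ and that of $g(x)$ is $M^{-1}$ under this sign convention.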


Now, the control objective is to design a control law $u$ that not only stabilizes (9) but also
minimizes the following infinite-horizon quadratic cost function:

$$V(x) = \int_0^{\infty} r(x,u)\,dt \qquad (10)$$
$$r(x,u) = Q(x) + u^T R u \qquad (11)$$

in which $Q(x)$ is a positive definite function of $x$ and $R$ is a symmetric positive definite
matrix.

This work presents an approximate approach, called adaptive dynamic programming (ADP), to optimal
control design. Following [21, 22], consider the affine system

$$\dot{x} = f(x) + g(x)u \qquad (12)$$

where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, and $f(x)$ and $g(x)$ satisfy the Lipschitz condition on a set $\Omega$ with $0 \in \Omega$.
The cost function is defined as in (10). The next definition, given in [17, 18], shows that the
optimal control solution is sought within the set of admissible controls.


Definition 1: A control policy $u(x)$ is defined as an admissible policy if $u(x)$ stabilizes system (12) and
the corresponding value function $V^u$ is finite. $P(\Omega)$ denotes the set of admissible control policies.

For any admissible policy $u(x)$, the nonlinear Lyapunov equation (NLE) can be formulated as

$$r(x,u(x)) + (\nabla V^u)^T (f(x) + g(x)u(x)) = 0 \qquad (13)$$
Defining the Hamiltonian and the optimal cost function as follows:

$$H(x,u,\nabla V) = r(x,u) + (\nabla V)^T (f(x) + g(x)u) \qquad (14)$$
$$V^*(x) = \min_{u \in P(\Omega)} \int_0^{\infty} r(x,u)\,dt$$

we arrive at the following HJB equation:

$$0 = \min_{u \in P(\Omega)} H(x,u,\nabla V^*) = H(x,u^*,\nabla V^*) \qquad (15)$$

Note that $u^*$ is the optimal policy corresponding to the optimal cost function $V^*$, and that
$H(x,u,\nabla V^u) = 0$ for any admissible policy $u$ is the NLE (13).

The optimal control policy is obtained by taking the derivative of the Hamiltonian with
respect to the policy $u$:

$$u^*(x) = -\frac{1}{2} R^{-1} g^T(x) \nabla V^*(x) \qquad (16)$$
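Because the Hamiltonian is quadratic in $u$ with a positive definite $R$, the stationary point $-\frac{1}{2}R^{-1}g^T\nabla V$ is its unique minimizer. The short sketch below checks this numerically with randomly generated data; the dimensions, the random values and the function names are illustrative assumptions only.

```python
import numpy as np

def hamiltonian(q_cost, u, R, grad_V, f, g):
    """H = Q(x) + u^T R u + grad_V^T (f + g u), all evaluated at one state x."""
    return q_cost + u @ R @ u + grad_V @ (f + g @ u)

def optimal_policy(R, g, grad_V):
    """Stationary point of H with respect to u: -1/2 R^{-1} g^T grad_V."""
    return -0.5 * np.linalg.solve(R, g.T @ grad_V)

def min_gap(seed=0, trials=200):
    """Smallest increase of H over random perturbations of the stationary point."""
    rng = np.random.default_rng(seed)
    n, m = 4, 2
    A = rng.normal(size=(m, m))
    R = A @ A.T + m * np.eye(m)          # symmetric positive definite
    g = rng.normal(size=(n, m))
    f = rng.normal(size=n)
    grad_V = rng.normal(size=n)
    q_cost = 1.0                         # value of Q(x) at the chosen state
    u_star = optimal_policy(R, g, grad_V)
    h_star = hamiltonian(q_cost, u_star, R, grad_V, f, g)
    return min(
        hamiltonian(q_cost, u_star + rng.normal(size=m), R, grad_V, f, g) - h_star
        for _ in range(trials)
    )
```

Every perturbation $d$ raises the Hamiltonian by $d^T R d > 0$, so the returned gap is strictly positive.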


This work presents the Policy Iteration (PI) algorithm for a robot manipulator, which consists of
the following two steps:

Initialize an admissible control policy $u^0(x)$
Repeat
Step 1: Policy evaluation
Solve the NLE for $V^i$ corresponding to the given control policy $u^i$:

$$r(x,u^i) + (\nabla V^i)^T (f(x) + g(x)u^i(x)) = 0 \qquad (17)$$

Step 2: Policy improvement
Update the policy according to

$$u^{i+1}(x) = -\frac{1}{2} R^{-1} g^T(x) \nabla V^i(x) \qquad (18)$$

Until $i = i_{max}$ or $\|V^i - V^{i-1}\| \le \varepsilon$

where $i_{max}$ is the maximum number of iterations and $\varepsilon$ is an arbitrarily small given positive number.
This algorithm is analyzed in [21], where each control policy $u^i$ is proved to be admissible.
The cost function $V^i$ decreases at each step until it converges to the optimal cost function, and $u^i$
converges toward the optimal policy as well.
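For a linear system $\dot{x} = Ax + Bu$ with $Q(x) = x^T Q x$ and $V^i = x^T P_i x$, the two steps above reduce to a Lyapunov equation followed by a gain update, which makes the iteration easy to try out. The sketch below is such a special case and not the manipulator algorithm itself; the matrices in the usage note, the stopping rule on the change of $P$ and the function names are assumptions made for illustration. The Lyapunov equation is solved by Kronecker-product vectorization using NumPy only.

```python
import numpy as np

def solve_lyapunov(A_cl, M):
    """Solve A_cl^T P + P A_cl + M = 0 by row-major vectorization."""
    n = A_cl.shape[0]
    I = np.eye(n)
    lhs = np.kron(A_cl.T, I) + np.kron(I, A_cl.T)
    P = np.linalg.solve(lhs, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)               # remove rounding asymmetry

def policy_iteration(A, B, Q, R, K0, i_max=50, eps=1e-10):
    """Policy iteration for u = -K x; K0 must be stabilizing (admissible)."""
    K, P_prev = K0, None
    for i in range(i_max):
        A_cl = A - B @ K
        # Step 1: policy evaluation (the NLE becomes a Lyapunov equation).
        P = solve_lyapunov(A_cl, Q + K.T @ R @ K)
        # Step 2: policy improvement, u = -1/2 R^{-1} B^T grad(V) = -R^{-1} B^T P x.
        K = np.linalg.solve(R, B.T @ P)
        if P_prev is not None and np.linalg.norm(P - P_prev) <= eps:
            break
        P_prev = P
    return P, K, i + 1
```

As a usage example, with the stable pair `A = [[0, 1], [-1, -2]]`, `B = [[0], [1]]`, `Q = I`, `R = [[1]]` and the admissible initial gain `K0 = 0`, the returned `P` should make the Riccati residual `A.T @ P + P @ A + Q - P @ B @ inv(R) @ B.T @ P` vanish to numerical precision.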

However, the nonlinear Lyapunov equation (17) is hard to solve directly. Therefore, in recent years,
many researchers have sought indirect ways to solve this equation [20-25]. In the next
steps, two neural networks forming an Actor-Critic (AC) structure are trained simultaneously to solve the
HJB equation approximately.

The cost function and its associated policy can be represented by a neural network (NN)
as follows:

$$V^* = W^T \phi(x) + \varepsilon_v$$
$$u^* = -\frac{1}{2} R^{-1} g^T \nabla\phi^T W + \varepsilon_u \qquad (19)$$

where $\phi(x)$ is the activation function vector of the NN, usually selected as polynomial, Gaussian,
sigmoid functions and so on, and $\nabla$ denotes $\partial/\partial x$.
The approximated optimal cost function and optimal policy are

$$\hat{V} = \hat{W}_c^T \phi(x)$$
$$\hat{u} = -\frac{1}{2} R^{-1} g^T \nabla\phi^T \hat{W}_a \qquad (20)$$


Note that, to approximate the HJB solution, only the term $\hat{W}_c$ needs to be found. However, to stabilize
the closed-loop system, both $\hat{W}_a$ and $\hat{W}_c$ are employed, which gives additional flexibility for
maintaining the stability of the system during the learning process.

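Replacing the optimal policy and the optimal cost function in the HJB equation by the Actor-Critic
networks yields the HJB error:

$$Q(x) + \hat{u}^T R \hat{u} + \hat{W}_c^T \nabla\phi \left( f(x) + g(x)\hat{u} \right) = \delta_{hjb} \qquad (21)$$
$$Q(x) + \frac{1}{4}\hat{W}_a^T \nabla\phi\, G\, \nabla\phi^T \hat{W}_a + \hat{W}_c^T \nabla\phi \left( f(x) - \frac{1}{2} G \nabla\phi^T \hat{W}_a \right) = \delta_{hjb} \qquad (22)$$

where $G = g R^{-1} g^T$.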
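The two forms of the HJB error are related by substituting the actor policy into the first one. The sketch below evaluates both for a small example and can be used to confirm that they agree; the three-term polynomial basis, the dimensions and all function names are hypothetical and chosen only to keep the example short.

```python
import numpy as np

def phi_grad(x):
    """Jacobian of the hypothetical basis phi(x) = [x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])

def actor_policy(x, W_a, g, R):
    """u_hat = -1/2 R^{-1} g^T grad(phi)^T W_a."""
    return -0.5 * np.linalg.solve(R, g.T @ phi_grad(x).T @ W_a)

def hjb_error_with_policy(x, W_c, W_a, f, g, Q, R):
    """HJB error written in terms of u_hat, as in (21)."""
    u = actor_policy(x, W_a, g, R)
    return x @ Q @ x + u @ R @ u + W_c @ phi_grad(x) @ (f + g @ u)

def hjb_error_with_weights(x, W_c, W_a, f, g, Q, R):
    """HJB error written in terms of the weights only, as in (22)."""
    dphi = phi_grad(x)
    G = g @ np.linalg.solve(R, g.T)      # G = g R^{-1} g^T
    return (x @ Q @ x
            + 0.25 * W_a @ dphi @ G @ dphi.T @ W_a
            + W_c @ dphi @ (f - 0.5 * G @ dphi.T @ W_a))
```

Here `f` and `g` are the values of $f(x)$ and $g(x)$ at the given state, and `R` is assumed symmetric, which is what makes the quadratic term collapse to the factor $1/4$.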
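The tuning law for $\hat{W}_c$ is described as follows:

$$\dot{\hat{W}}_c = -\eta_c \Gamma \frac{\omega}{1 + \nu \omega^T \Gamma \omega} \delta_{hjb} \qquad (23)$$
$$\dot{\Gamma} = -\eta_c \Gamma \frac{\omega \omega^T}{1 + \nu \omega^T \Gamma \omega} \Gamma, \quad \Gamma(t_r^+) = \Gamma(0) \qquad (24)$$

where $t_r^+$ is the resetting time. To avoid slow convergence of $\hat{W}_c$, the matrix $\Gamma$ is
reset to the default matrix $\Gamma(0)$ when the minimum eigenvalue of $\Gamma$ reaches a given small positive number.
Here $\omega = \nabla\phi \left( f(x) + g(x)\hat{u} \right)$ and $1 + \nu \omega^T \Gamma \omega$ is the normalization factor.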
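A discrete-time sketch of a normalized least-squares critic update with covariance resetting is given below. It assumes the form $\dot{\hat{W}}_c = -\eta_c \Gamma \omega \delta_{hjb} / (1 + \nu \omega^T \Gamma \omega)$ with the matching covariance law, integrated by explicit Euler. The test problem is a deliberately simple stand-in, not the manipulator: the scalar system $\dot{x} = -x + u$ under the fixed policy $u = 0$ with $r = x^2 + u^2$ and basis $\phi(x) = x^2$, for which the exact weight is $0.5$. The sampled state signal, the gains and the reset threshold are hypothetical.

```python
import numpy as np

def critic_step(W_c, Gamma, omega, delta, eta_c, nu, dt):
    """One explicit-Euler step of the normalized critic and covariance laws."""
    denom = 1.0 + nu * (omega @ Gamma @ omega)
    W_dot = -eta_c * (Gamma @ omega) / denom * delta
    Gamma_dot = -eta_c * Gamma @ np.outer(omega, omega) @ Gamma / denom
    return W_c + dt * W_dot, Gamma + dt * Gamma_dot

def evaluate_fixed_policy(T=20.0, dt=1e-3, eta_c=1.0, nu=1.0,
                          gamma0=10.0, reset_below=1.0):
    """Learn the critic weight of V = w * x^2 for xdot = -x, r = x^2."""
    W_c = np.zeros(1)
    Gamma = gamma0 * np.eye(1)
    for k in range(int(T / dt)):
        x = 1.0 + 0.5 * np.sin(k * dt)   # sampled state, kept away from zero
        u = 0.0                          # fixed policy being evaluated
        f = -x                           # drift of the toy system, g = 1
        grad_phi = np.array([2.0 * x])   # gradient of phi(x) = x^2
        omega = grad_phi * (f + u)
        delta = x**2 + u**2 + W_c @ omega
        W_c, Gamma = critic_step(W_c, Gamma, omega, delta, eta_c, nu, dt)
        # Covariance resetting: restore Gamma(0) once its smallest
        # eigenvalue drops below the chosen threshold.
        if np.linalg.eigvalsh(Gamma).min() < reset_below:
            Gamma = gamma0 * np.eye(1)
    return W_c
```

Without the reset, $\Gamma$ keeps shrinking and the adaptation rate decays roughly like $1/t$; the reset keeps the rate bounded away from zero, which is the motivation given above for restoring $\Gamma(0)$.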
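To ensure the convergence of $\hat{W}_c$ under the update law (23)-(24), the normalized regressor $\psi$ must
satisfy the Persistence of Excitation (PE) condition [21]:

$$\mu_2 I \ge \int_{t_0}^{t_0+T} \psi(\tau)\psi(\tau)^T d\tau \ge \mu_1 I \qquad (25)$$

for some positive numbers $\mu_1$, $\mu_2$, $T$, where

$$\psi(t) = \frac{\omega}{\sqrt{1 + \nu \omega^T \Gamma \omega}}$$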
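A PE condition of this windowed-integral type can be checked on recorded data by integrating $\psi\psi^T$ over a sliding window and inspecting the extreme eigenvalues. The helper below is a sketch under that reading of the condition; the function name, the rectangle-rule integration and the `stride` parameter are assumptions made for illustration.

```python
import numpy as np

def pe_bounds(psi_samples, dt, window, stride=50):
    """Smallest and largest eigenvalue of the windowed integral of psi psi^T.

    psi_samples has shape (num_samples, num_weights). The integral is
    approximated by a rectangle rule and evaluated every `stride` samples.
    """
    steps = int(round(window / dt))
    outer = np.einsum('ki,kj->kij', psi_samples, psi_samples) * dt
    csum = np.cumsum(outer, axis=0)
    lo, hi = np.inf, 0.0
    for k in range(steps, len(psi_samples), stride):
        integral = csum[k] - csum[k - steps]
        eig = np.linalg.eigvalsh(integral)
        lo, hi = min(lo, eig[0]), max(hi, eig[-1])
    return lo, hi
```

For example, the regressor $[\sin t, \cos t]^T$ over a window of length $2\pi$ gives both bounds close to $\pi$, whereas $[\sin t, \sin t]^T$ gives a lower bound of essentially zero, so no positive $\mu_1$ exists and the second signal is not persistently exciting.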


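On the other hand, (22) is a nonlinear equation in $\hat{W}_a$. Therefore, the tuning law for $\hat{W}_a$ is
formulated based on the gradient descent (GD) algorithm to minimize the squared HJB error $\delta_{hjb}^2$:

$$\dot{\hat{W}}_a = \text{proj}\left\{ -\frac{\eta_{a1}}{\sqrt{1 + \omega^T \omega}} \nabla\phi\, G\, \nabla\phi^T (\hat{W}_a - \hat{W}_c)\, \delta_{hjb} - \eta_{a2} (\hat{W}_a - \hat{W}_c) \right\} \qquad (26)$$

where $\text{proj}\{\cdot\}$ is a projection operator [22] that ensures the boundedness of the update law.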
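The actor update can be sketched in discrete time as follows. The gradient term uses the fact that the derivative of the weight-only HJB error with respect to $\hat{W}_a$ is $\frac{1}{2}\nabla\phi\, G\, \nabla\phi^T(\hat{W}_a - \hat{W}_c)$. The projection used here is a simplification: instead of the projection operator of [22], the weights are simply rescaled onto a norm ball after each Euler step. The function names, the `radius` bound and the step size are hypothetical.

```python
import numpy as np

def project_to_ball(W, radius):
    """Stand-in for proj{.}: rescale W onto the ball of the given radius."""
    norm = np.linalg.norm(W)
    return W if norm <= radius else W * (radius / norm)

def actor_step(W_a, W_c, dphi, G, omega, delta,
               eta_a1, eta_a2, dt, radius=100.0):
    """One explicit-Euler step of a gradient-type actor law.

    dphi is the Jacobian of phi at the current state, G = g R^{-1} g^T,
    omega is the critic regressor and delta the current HJB error.
    """
    diff = W_a - W_c
    grad_term = (eta_a1 / np.sqrt(1.0 + omega @ omega)
                 * (dphi @ G @ dphi.T @ diff) * delta)
    W_dot = -grad_term - eta_a2 * diff
    return project_to_ball(W_a + dt * W_dot, radius)
```

When the HJB error is zero, only the second term acts and the actor weights are pulled exponentially toward the critic weights, which is how the two networks are coupled.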


Note that the parameters $\eta_c$, $\nu$, $\eta_{a1}$, $\eta_{a2}$ of the two NN update laws must be selected to satisfy
certain conditions [22] to ensure stability of the closed-loop system. A complete proof of the
convergence of the parameters and the stability of the system can be found in [22].


3.2. RISE feedback control design 
In [3], the control term $\mu(t)$ is designed based on the RISE framework as follows:

$$\mu(t) \triangleq (k_s + 1)e_2(t) - (k_s + 1)e_2(0) + v(t) \qquad (27)$$

where $v(t) \in \mathbb{R}^n$ is described by

$$\dot{v} = (k_s + 1)\alpha_2 e_2 + \beta\,\text{sgn}(e_2) \qquad (28)$$

$k_s \in \mathbb{R}$ is a positive constant control gain, and $\beta \in \mathbb{R}$ is a positive control gain
selected according to the following sufficient condition,


AA 
A (29) 
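The RISE term can be evaluated from sampled values of $e_2$ by integrating the auxiliary signal numerically. The sketch below assumes the standard RISE structure $\dot{v} = (k_s + 1)\alpha_2 e_2 + \beta\,\text{sgn}(e_2)$ with scalar gains and uses explicit Euler integration; the function name and the gain values in the usage note are hypothetical.

```python
import numpy as np

def rise_term(e2_history, dt, k_s, alpha2, beta):
    """Return mu(t_k) for every sample of e2, with v(0) = 0."""
    e2_history = np.asarray(e2_history, dtype=float)
    v = np.zeros_like(e2_history[0])
    mu = np.zeros_like(e2_history)
    for k, e2 in enumerate(e2_history):
        # mu(t) = (k_s + 1) e2(t) - (k_s + 1) e2(0) + v(t)
        mu[k] = (k_s + 1.0) * (e2 - e2_history[0]) + v
        # Euler step of dv/dt = (k_s + 1) alpha2 e2 + beta sgn(e2)
        v = v + dt * ((k_s + 1.0) * alpha2 * e2 + beta * np.sign(e2))
    return mu
```

For a constant error $e_2 = [0.1, -0.2]^T$ held for one second with $k_s = 2$, $\alpha_2 = 1.5$ and $\beta = 0.5$, the proportional part cancels and the integral part alone gives $\mu = [0.95, -1.4]^T$, which shows how the sign term keeps acting even when the error stops changing.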


Remark 1: In contrast to the work in [3], in this work the ADP algorithm is used to find
the intermediate optimal control input in the absence of dynamic uncertainty. Furthermore, the ADP technique
considered in [20-26] had not yet been applied to a robotic manipulator.

Remark 2: Compared with the work of Dixon [3], which designs the optimal control by solving the Riccati
equation, this work requires partial knowledge of the manipulator's dynamics, including the matrices M and C.
However, with the ADP approach, the optimal control problem is addressed in the general case for any given
cost function of the form (10) without constraint.

4. OFFLINE SIMULATION RESULTS
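Consider the offline simulation of a two-link manipulator control system using the ADP technique and
the RISE algorithm.

The general dynamics of the two-link manipulator are represented by (1) with

$$M = \begin{bmatrix} 5 + 2\cos(q_2) & 1 + \cos(q_2) \\ 1 + \cos(q_2) & 1 \end{bmatrix}, \quad C = \begin{bmatrix} -\dot{q}_2 \sin(q_2) & -(\dot{q}_1 + \dot{q}_2)\sin(q_2) \\ \dot{q}_1 \sin(q_2) & 0 \end{bmatrix}$$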


amoa aes ad | ae 


: 
cos(g, +q) F =-0. lsign(q) í 0.1cos(z) | 


The value function is (10) with the term $Q(x) = x^T Q x$, where:


o, -| 2 Q, ga” 2 mT 4 r _[4 o] ,_[025 0 
"10, Q, € f2 40 PARA 2 JO 4 | oO 0.25 
pe A 
a= 
10.6 10.4 
Without loss of generality, the set-point is selected as $q_d = [0\ \ 0]^T$ and the initial state is
$q_0 = [0.1598\ \ 0.2257]^T$.


The optimal value function which is solved directly in [3] is 


0 M 


nxn 


Vv" = x Bay O sen 


| = 2x7 — 4x +3x,x, +2.5xX +X cos(x, ) +X, +x, +0.5x3x, cos(x, ) 


The update laws of $\hat{W}_c$ and $\hat{W}_a$ are given in (23) and (26) with


$\eta_c = 800$, $\nu =$ [illegible], $\Gamma(0) = 100$, [illegible] $= 0.001$, $\eta_{a1} = 0.01$, $\eta_{a2} =$ [illegible]


The NN activation function is selected as


T 
ox) =| x X, xX% X a cos(x) X GX ek, cos(x, ) | 


The optimal parameter vector obtained by directly solving the HJB equation as shown in [3] is
$W = [2\ \ 4\ \ 3\ \ 2.5\ \ 1\ \ 1\ \ 1\ \ 0.5]^T$. Figures 2 and 3 show the convergence of $\hat{W}_c$ and $\hat{W}_a$.
The value of $\hat{W}_c$ after 110 s is $[2\ \ldots\ 0.5]$. To satisfy the PE condition (25), a probing signal is
added to the system input. Moreover, the evolution of the tracking error and the state, shown in Figure 4,
confirms the stability of the control system.

Figure 2. Convergence of critic’s parameters
Figure 3. Convergence of actor’s parameters
Figure 4. State’s evolution


5. CONCLUSION 

This paper addressed the problem of optimal control design for a manipulator in combination with
RISE and exact linearization. With the ADP technique, the solution of the HJB equation was found by an
iterative algorithm to obtain a controller that ensures not only convergence of the weights but also position
tracking. Offline simulations were carried out to validate the performance and effectiveness of the optimal
control for manipulators.